* This do-file runs the regressions that replicate and extend part of Chan et al. (2020)



*** The replication folder contains:
** a) the crosswalk from NUTS 2015 to NUTS 2018;
** b) the latent class analysis results;
** c) the two crosswalks needed to map the local authorities (in the Special Licence dataset SN 6666)
** to the NUTS regions.

** It is the user's responsibility to obtain the following:

** 1) SPECIAL LICENCE UKHLS DATA with geographic identifiers for respondents

*** You need to obtain Special Licence dataset SN 6666 - Understanding Society: Waves 1-11, 2009-2020 
*** Follow the instructions at https://www.understandingsociety.ac.uk/documentation/access-data/
*** Then place the folder UKDA-6666-stata in the ChanBrexitReplication subfolder.


*** 2) UKHLS SURVEY DATA with individual and household information

*** You need to obtain the ukhls data following 
*** the instructions provided at
*** https://www.understandingsociety.ac.uk/documentation/access-data/
*** The code here assumes you obtained the UKDA-6614-stata files. 
*** The waves used are waves 1-8, i.e. the files with prefixes a through h.
*** Place the entire folder UKDA-6614-stata in the ChanBrexitReplication subfolder.

*** IMPORTANT!

*** Before running this Stata do-file, you need to run Prepare_a_hidp_NUTS_Replication.R, after having placed the
*** restricted UKDA-6666-stata folder, obtained as per the instructions above, in the ChanBrexitReplication folder.
*** That script creates a dataset, ukhls.ids.dta, that assigns to every respondent the NUTS3 region
*** of residence. After running it, check that ukhls.ids.dta has been created in the ChanBrexitReplication subfolder.
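
*** A minimal sketch of that check (not part of the original pipeline), assuming the
*** relative path used below: stop early if the R script has not produced the file yet.
capture confirm file "./ChanBrexitReplication/ukhls.ids.dta"
if _rc {
    display as error "ukhls.ids.dta not found: run Prepare_a_hidp_NUTS_Replication.R first"
    exit 601
}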

use "./ChanBrexitReplication/ukhls.ids.dta", clear

*** This should be a dataset with columns hidp wave NUTS318CD NUTS318NM (created from the R script above)
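
*** Optional sketch (not in the original): confirm that the expected columns are present.
*** confirm variable stops with an error if any of them is missing.
confirm variable hidp wave NUTS318CD NUTS318NM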

keep if wave=="h"

gen nuts318cd=NUTS318CD

gen NUTS1=substr(NUTS318CD,1,3)

*drop if NUTS1=="UKN"

merge m:1 nuts318cd using "./ChanBrexitReplication/n15to18cw.dta"

gen NUTS2=substr(nuts315cd,1,4)

gen NUTS_code=nuts315cd

duplicates tag hidp wave, gen(duplo)

replace NUTS_code=NUTS2 if duplo>0

bys hidp wave NUTS_c: keep if _n==1

duplicates tag hidp wave, gen(basicdupli)

replace NUTS_code=NUTS1 if basic>0

drop basic duplo

bys hidp wave NUTS_c: keep if _n==1

drop _me 

save "./ChanBrexitReplication/ukhls.nuts.dta", replace

use "./ChanBrexitReplication/Replication_DB_Regional.dta", clear

keep nuts1 nuts2 nuts3 import_shock 

tab nuts3

rename nuts3 NUTS_code

decode nuts1, gen(n1)

drop nuts1

rename n1 nuts1

save "./ChanBrexitReplication/ChinaN3.dta", replace

use "./ChanBrexitReplication/Replication_DB_Regional.dta", clear

keep nuts1 nuts2 nuts3 import_shock 

decode nuts1, gen (n1t)

drop nuts1

rename n1t nuts1

keep if nuts1=="UKM"

collapse (mean) import_shock , by(nuts2)

rename nuts2 NUTS_code

save "./ChanBrexitReplication/ChinaN2.dta", replace

use "./ChanBrexitReplication/Replication_DB_Regional.dta", clear

keep nuts1 nuts2 nuts3 import_shock 
decode nuts1, gen (n1t)

drop nuts1

rename n1t nuts1

keep if nuts1=="UKM"

collapse (mean) import_shock , by(nuts1)

rename nuts1 NUTS_code

save "./ChanBrexitReplication/ChinaN1.dta", replace

use "./ChanBrexitReplication/ChinaN3.dta",clear

append using "./ChanBrexitReplication/ChinaN2.dta"

append using "./ChanBrexitReplication/ChinaN1.dta"

save  "./ChanBrexitReplication/ChinaShock.dta", replace

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/a_indresp.dta", clear
 
keep pidp a_natid1 a_natid2 a_natid3 a_natid4 a_natid5 a_natid6 a_natid97 a_britid  a_citzn1 a_ukborn
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave1.dta", replace 

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/b_indresp.dta", clear
 
keep pidp b_natid1-b_natid97 b_arts2a1-b_arts2freq b_mla3 b_citzn1 b_ukborn
 
** The arts variables are kept here even though the paper states that the measure is based on wave 3.
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave2.dta", replace 

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/c_indresp.dta", clear
 
keep pidp c_natid* c_britid  c_citzn1 c_ukborn
 
** The arts variables are not in wave 3, so they are taken from wave 2.
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave3.dta", replace 

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/d_indresp.dta", clear
 
keep pidp d_natid* d_citzn1 d_ukborn
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave4.dta", replace 

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/e_indresp.dta", clear
 
keep pidp e_natid* e_arts2a1-e_arts2freq e_mla3 e_citzn1 e_ukborn
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave5.dta", replace 

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/f_indresp.dta", clear
 
keep pidp f_natid* f_britid f_citzn1 f_citznyear f_ukborn
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave6.dta", replace
 
use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/g_indresp.dta", clear
 
keep pidp  g_ukborn g_ff_ukborn
 
save  "./ChanBrexitReplication/ReconstructedReplicationData/wave7.dta", replace
 
*** The poverty level will be taken from the household-level data
 
use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/h_hhresp.dta", clear
/*
To determine relative poverty status, first compute the equivalized household income by dividing 
the total household income by the square root of household size. 
Following a convention in poverty research (Jenkins, 2011), the relative poverty line
is set as 60% of the sample median of the equivalized household income. 
*/
 
*Square root of family size

gen peopleroot =(h_nchoecd_dv +h_nadoecd_dv)^.5

gen equivalized_income= h_fihhmngrs_dv/peopleroot

* The sample median of equivalized income (1891.063) is hard-coded below

gen poor = equivalized<.6*1891.063 
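
* Optional sketch (not in the original): compare the hard-coded median with the data in memory.
* This assumes an unweighted, household-level median; the hard-coded value may have been
* computed on a different sample, so the two numbers need not coincide.
quietly summarize equivalized_income, detail
display "Sample median of equivalized income: " r(p50)
display "Implied poverty line (60% of the median): " 0.6*r(p50)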

keep h_hidp equivalized poor

save "./ChanBrexitReplication/poverty_dummy.dta", replace
 
*** Now the main individual-level file (wave 8, "h")

use "./ChanBrexitReplication/UKDA-6614-stata/stata/stata13_se/ukhls/h_indresp.dta", clear
 
keep pidp h_hidp h_eumem h_dvage h_sex h_ethn_dv h_racel_dv h_jbnssec8_dv  h_marstat_dv h_nch* h_qfhigh_dv h_indinui_xw h_citzn1 h_ukborn h_istr* h_pbirthy
 
*** Create the interview date variable
 
replace h_istrtdaty=. if h_istrtdaty<0

replace h_istrtdatm=. if h_istrtdatm<0

replace h_istrtdatd=. if h_istrtdatd<0

gen slash="/"

egen date=concat(h_istrtdatd slash h_istrtdatm slash h_istrtdaty)

gen proper_date=date(date, "DMY")

gen brexit=date("23/06/2016", "DMY")

gen when_interview = proper_date-brexit

gen after=when

replace after=0 if when<0

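* Exclude missing dates explicitly: in Stata, missing values satisfy >=0
replace when_interview=0 if when_interview>=0 & !missing(when_interview)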
 
gen age_at_brexit=2016- h_pbirthy
 
*** Build the Leave vote: set negative (missing) codes to missing, then shift the 1/2 coding to 0/1
 
gen leave=h_eumem
 
replace leave=. if leave<0

replace leave=leave-1

tab leave

* merge the previous waves to the Wave 8 (aka "h") master data

forvalues i=1 2 to 7{

		merge 1:1 pidp using  "./ChanBrexitReplication/ReconstructedReplicationData/wave`i'.dta", keep(master match)

		rename _merge merged`i'

}

*** sort out citizenship

foreach var of varlist *citzn1{

		replace `var'=. if `var'<0

}

foreach var of varlist *ukborn{

		replace `var'=. if `var'<0

}

** first, identify all those not born in the UK, according to the fed-forward wave g variable (g_ff_ukborn)

gen not_uk_born= g_ff_ukb==5

replace not_uk_born=0 if g_ff_ukb>0&g_ff_ukb<5

** then, update with the wave g and wave h responses (g_ukborn, h_ukborn)

replace not_uk=1 if g_ukborn==5

replace not_uk=0 if g_ukborn<5

replace not_uk=1 if h_ukborn==5

replace not_uk=0 if h_ukborn<5

** and now the citizens / naturalized

gen naturalized_dummy=not_uk

foreach var of varlist a_citzn1 b_citzn1 c_citzn1 d_citzn1 e_citzn1 f_citzn1{

		replace naturalized_dummy=`var' if `var'==1

}

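** Note: naturalized_dummy is initialised to not_uk_born, so as written citizen equals 1 for every respondent
gen citizen=not_uk==0|naturalized==1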

*** clean up the British identity variable

local i = 1

foreach var of varlist *_britid{

		replace `var'=. if `var'<0|`var'>10

		rename `var' britid`i'

		local i= `i'+1
}

egen british_id =rowmean(britid1 britid2 britid3)

*** create the English identity variable(s)

local i=1

foreach var of  varlist *_natid1{

		replace `var'=. if `var'<0

		rename `var' english_`i'

		local i=`i'+1

}



gen english_recent=english_1

forvalues i=2 3 to 6{

		replace english_recent=english_`i' if english_`i'!=.
		
}


****** now British 

local i=1

foreach var of  varlist *_natid5{

		replace `var'=. if `var'<0

		rename `var' british_`i'

		local i=`i'+1

}

gen british_recent=british_1

forvalues i=2 3 to 6{

		replace british_recent=british_`i' if british_`i'!=.

}



forvalues j=2 3 to 4{

		local i=1

		foreach var of  varlist *_natid`j'{


				replace `var'=. if `var'<0

				rename `var' nat_`j'_`i'

				local i=`i'+1

		}	

}


forvalues j=2 3 to 4{

		gen nat_`j' = nat_`j'_1

		forvalues i=2 3 to 6{

				replace nat_`j'=nat_`j'_`i' if nat_`j'_`i'!=. 

		}

}



rename british_recent british
rename english_recent english

****** also the other foreign

forvalues j=6 97 to 97{

local i=1

		foreach var of  varlist *_natid`j'{


		replace `var'=. if `var'<0

		rename `var' nat_`j'_`i'

		local i=`i'+1

		}	

}


forvalues j=6 97 to 97{

		gen nat_`j' = nat_`j'_1

		forvalues i=2 3 to 6{

				replace nat_`j'=nat_`j'_`i' if nat_`j'_`i'!=.


		}

}




/* Now create the dummies: where "English" is English unless also British 

This procedure yields a fivefold typology: 
(1) British only 
(2) English only 
(3) Welsh, Scottish, or (Northern) Irish only 
(4) British and English
(5) all other combinations
*/

*Count how many identities a person lists

egen total= rowtotal(english british nat_2 nat_3 nat_4 nat_6 nat_97)

*(1) British only 

gen british_only=total==1&british==1

replace british_only=. if british==.

*(2) English only, 

gen english_only=total==1&english==1

replace english_only=. if english==.

*(3)Welsh, Scottish, or (Northern) Irish only, 

* Test ==1 explicitly: a bare variable in a logical expression treats missing as true
gen wsni_only= total==1&(nat_2==1|nat_3==1|nat_4==1)

replace wsni_only=. if (nat_2==.&nat_3==.&nat_4==.)

*(4) British and English, 

gen british_and_english= total==2&(english==1)&(british==1)

replace british_and_english= . if(english==.|british==.)

** and (5) all other combinations
** notice that in the original paper the one omitted is British only

gen all_other=!english_o&!wsni_o&!british_and&!british_o
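
* Optional sketch (not in the original): if the identity items are coded 0/1, the five dummies
* would be expected to sum to exactly 1 for respondents with no missing British, English,
* Welsh, Scottish or Northern Irish item. typology_sum is a temporary, hypothetical name.
egen typology_sum = rowtotal(british_only english_only wsni_only british_and_english all_other)
tab typology_sum if !missing(english, british, nat_2, nat_3, nat_4)
drop typology_sum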

***** Clean the demographics

replace h_dvage=. if h_dvage<18

gen sex=h_sex==2

replace sex=. if h_sex==.

gen marital_status= h_marstat_dv

replace marital_status=. if marital_status<0 

recode marital_status  1=2 3=5 4=5 6=1

label define marital 1 "single" 2 "couple" 5 "divorced/widowed"
 
label values marital_status marital

gen children= h_nchild_dv

recode children 2=1 

replace children=3 if children>=3 &children!=.

label define chd 0 "0" 1 "1-2" 3 "3+" 
 
label values children chd

gen race = h_racel

recode race -9=. 2=1 3=1 4=1 6=5 7=5 8=5 10=9 11=9 12=9 13=9 15=14 16=14 17=5 97=5

label define race 1 "White" 5 "Other" 9 "Asian" 14 "Black" 
 
label values race race
 
** social status and income
/*
Social class with the sixfold version of National Statistics Socio-Economic Classification 
(NS-SEC).Note: This is a coarsening of  h_jbnssec8_dv
*/

gen class=h_jbnssec8_dv

gen class2=h_jbnssec8_dv

replace class=. if class<0

replace class2=. if class2<=0

recode class 1=2 7=8 

recode class2 1=2 7=8 



 
save "./ChanBrexitReplication/almostthere.dta", replace


keep pidp h_hidp b_arts2a1-b_arts2freq b_mla e_arts2a1-e_arts2freq e_mla h_dvage
 
foreach var of varlist b_arts2a1-e_mla{
 
		replace `var'=. if `var'<0 
 
		label values `var' .
 
		replace `var'=`var'+1
 
}
 
foreach num in a1 a2 a3 a4 a5 a6 a7 a96 b9 b10 b11 b12 b13 b14 b15 b96{
 
		replace e_arts2`num'=b_arts2`num' if e_arts2`num'==.
 
}
 
replace e_mla=b_mla if e_mla==.
 
keep pidp h_hidp e_* h_dv
 
** The following exports the data that can be used to re-estimate the LCA from scratch 
 
saveold "./ChanBrexitReplication/forlatentclass.dta", replace version(12)
  
/* This Stata do file uses the LCA results included with the replication file. 
In case you want to recreate from scratch the categories, you need to run the R script 
LCA_estimation_Replication.R
Please ensure that the numerical labels in the LCA data exported from R are structured so that
1 corresponds to "omnivore", 2 to "paucivore", and 3 to "univore".
 */
 
use "./ChanBrexitReplication/almostthere.dta", clear
 
***Now add the NUTS identifiers

cap drop _me

rename h_hidp hidp

merge m:1 hidp using "./ChanBrexitReplication/ukhls.nuts.dta"

*Merge the China shock

cap drop _me

replace NUTS_code="UKJ13" if NUTS_code=="UKJ12"

merge m:1 NUTS_code using "./ChanBrexitReplication/ChinaShock.dta"
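
* Optional diagnostic (not in the original): tabulate the merge result to see how many
* respondents sit in regions with no import-shock record, and vice versa.
tab _merge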

encode NUTS1, gen(NC1)

cap drop _me

drop if pidp==.

merge 1:1 pidp using "./ChanBrexitReplication/LCA.dta"

replace class=0 if class==.

cap drop _me

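* The household identifier was renamed to hidp for the NUTS merge; restore the name used in poverty_dummy.dta
rename hidp h_hidp

merge m:1 h_hidp using "./ChanBrexitReplication/poverty_dummy.dta"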

label define culturalconsumption 1 "omnivore" 2 "paucivore" 3 "univore"

label values latentclass culturalconsumption

gen agesq =h_dvage^2

replace h_qfhigh_dv=. if h_qfhigh_dv<0

keep if age_at>=19

encode NUTS1, gen(NC)

cap drop consumption*

tab latent, gen(consumption)


*** Regression of Leave vote, with the identity and cultural consumption variables (column 4 table 1)
* to identify the estimation sample of the long regression. 


logit leave import british_and all_ot english_only wsni_only british_id poor  i.class consumption1 consumption2  i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if citizen==1, cluster(NUTS_c)

cap drop in_broken

gen in_broken=e(sample)
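
* Optional sketch (not in the original): report the size of the estimation sample
* that the regressions below hold constant.
count if in_broken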

*** Baseline: as in column 3 of Table 1 and column 1 of Table A.3, keeping the sample constant

logit leave import  poor  i.class   i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if in_broken, cluster(NUTS_c)

*** Long regression: as in column 4 of Table 1 and column 2 of Table A.3

logit leave import british_and all_ot english_only wsni_only british_id poor  i.class consumption1 consumption2  i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if citizen==1, cluster(NUTS_c)


*** The two columns (2-3) of Table 2, and also columns 3-4 of Table A.3

mlogit latent  import  poor  i.class i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if in_broken , cluster(NUTS_c)


*** Supplementary evidence: Columns 5-6 of Table A.3 
* Separate binary logits (instead of the multinomial logit)

logit consumption1   import i.class  poor  i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if in_broken , cluster(NUTS_c)

logit consumption2    import  i.class poor   i.NC i.race i.children i.marital_s i.h_sex h_dvage agesq i.h_qfhigh_dv [pweight= h_indinui_xw] if in_broken , cluster(NUTS_c)

* close the log file, if one is open (no log is opened in this do-file)

capture log close
